Modeling and Quality of Masked Microdata

نویسنده

  • William E. Winkler
چکیده

Statistical organizations collect data via survey forms and other methods. The microdata are valuable for modeling and analysis. To produce a public-use file, the organizations mask the data in a manner that may prevent re-identification of data associated with individual entities. The public-use microdata may allow one or two sets of analyses that approximately reproduce analyses that could be performed on the original microdata. This paper describes a general method of creating models of data that is related to methods of creating appropriate aggregates of data that are needed for sufficient statistics in general classes of models (Moore and Lee 1998, DuMouchel et al. 2000, Owen 2003). If the aggregates can be approximately reproduced, then the masked microdata may allow one or more analyses that correspond to analyses on the original, non-public microdata. It will typically not yield data suitable for general analyses.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Global Measures of Data Utility for Microdata Masked for Disclosure Limitation

When releasing microdata to the public, data disseminators typically alter the original data to protect the confidentiality of database subjects’ identities and sensitive attributes. However, such alteration negatively impacts the utility (quality) of the released data. In this paper, we present quantitative measures of data utility for masked microdata, with the aim of improving disseminators’...

متن کامل

Automatic Generation of Masked Microdata

Disclosure Control is the discipline concerned with the modification of data containing confidential information about individual entities, such as persons, households, businesses, etc. in order to prevent third parties working with these data from recognizing entities in the data and thereby disclosing information about these entities. In very broad terms, disclosure risk is the risk that a gi...

متن کامل

Reverse Mapping to Preserve the Marginal Distributions of Attributes in Masked Microdata

In this paper we describe a new procedure that is capable of ensuring that the marginal distributions of attributes in microdata masked with a masking mechanism end up being the same as the marginal distributions of attributes in the original data. We illustrate the application of the new procedure using several commonly used masking mechanisms.

متن کامل

P-Sensitive K-Anonymity with Generalization Constraints

Numerous privacy models based on the k‐anonymity property and extending the k‐anonymity model have been introduced in the last few years in data privacy re‐ search: l‐diversity, p‐sensitive k‐anonymity, (α, k) – anonymity, t‐closeness, etc. While differing in their methods and quality of their results, they all focus first on masking the data, and then protecting the quality of the data as a wh...

متن کامل

Controlled shuffling, statistical confidentiality and microdata utility: a successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-International database

IPUMS-International disseminates more than two hundred-fifty integrated, confidentialized census microdata samples to thousands of researchers world-wide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005